Help Needed with Ion Torrent Sequencing Data Demultiplexing: Retained Reads 0 (0.0%)


Olga Herrera

Aug 3, 2023, 11:11:22 PM
to Stacks
Hello everyone,

I am starting to work with raw data (single-end, Ion Torrent, 400 bp read length). Two restriction enzymes, SbfI and AseI, were used. I have the file with the barcodes, and I am trying to demultiplex, but the generated files are empty.

My command line is:

process_radtags -f ./rawdata/rawdata.fq -o ./rawdata/demultiplex/ -b ./rawdata/barcode --renz-1 sbfI --renz-2 aseI -r -c -q

And I am getting this result:

Processing single-end data.
Using Phred+33 encoding for quality scores.
Found 1 input file(s).
Searching for single-end, inlined barcodes.
Loaded 29 barcodes (10bp).
Will attempt to recover barcodes with at most 1 mismatches.
Processing file 1 of 1 [rawdata.fq]
  Processing RAD-Tags...1M...2M...3M...4M...
  4881292 total reads; -1915992 ambiguous barcodes; -2965297 ambiguous RAD-Tags; +19151 recovered; -3 low quality reads; 0 retained reads.
Closing files, flushing buffers...done.

4881292 total sequences
1915992 barcode not found drops (39.3%)
      3 low quality read drops (0.0%)
2965297 RAD cutsite not found drops (60.7%)
      0 retained reads (0.0%)
 

The .log file appears as follows:

BEGIN per_file_raw_read_counts
File        Retained Reads  Low Quality  Barcode Not Found  RAD cutsite Not Found  Total
rawdata.fq  0               3            1915992            2965297                4881292
END per_file_raw_read_counts

BEGIN total_raw_read_counts
Total Sequences 4881292
Barcode Not Found 1915992 39.3%
Low Quality 3 0.0%
RAD Cutsite Not Found 2965297 60.7%
Retained Reads 0 0.0%
END total_raw_read_counts

BEGIN per_barcode_raw_read_counts
Barcode     Filename   Total   RAD Cutsite Not Found  Low Quality  Retained Reads  Pct Retained  Pct Total Reads
CTGCAAGTTC  U19697pos  73898   73898                  0            0               0.0%          0.0%
TTCGTGATTC  U61020     103897  103897                 0            0               0.0%          0.0%
TTCCGATAAC  U63103     75324   75324                  0            0               0.0%          0.0%
TGAGCGGAAC  U63105     79666   79666                  0            0               0.0%          0.0%
TAGGTGGTTC  U63108     77227   77227                  0            0               0.0%          0.0%
TCTAACGGAC  U63109     101029  101029                 0            0               0.0%          0.0%
TTGGAGTGTC  U63110     110888  110888                 0            0               0.0%          0.0%
TCTAGAGGTC  U63111     89469   89469                  0            0               0.0%          0.0%
TCTGGATGAC  U63112     84921   84921                  0            0               0.0%          0.0%
TCTATTCGTC  U63113     106976  106976                 0            0               0.0%          0.0%
AGGCAATTGC  U63114     97696   97696                  0            0               0.0%          0.0%
CAGATCCATC  V41340     103922  103922                 0            0               0.0%          0.0%
TTCGAGACGC  U55530     155495  155495                 0            0               0.0%          0.0%
TGCCACGAAC  U55541     109689  109689                 0            0               0.0%          0.0%
AACCTCATTC  U55551     142829  142829                 0            0               0.0%          0.0%
AACCATCCGC  U55536     104223  104221                 2            0               0.0%          0.0%
ATCCGGAATC  U27140     158846  158846                 0            0               0.0%          0.0%
TCGACCACTC  U35644     124193  124193                 0            0               0.0%          0.0%
TCCAAGCTGC  U24996     111826  111825                 1            0               0.0%          0.0%
TCTTACACAC  U33100     4       4                      0            0               0.0%          0.0%
AACAATCGGC  U24970     100270  100270                 0            0               0.0%          0.0%
TCCACTTCGC  CAG01      13      13                     0            0               0.0%          0.0%
TTCAATTGGC  CAG02      122115  122115                 0            0               0.0%          0.0%
CCTACTGGTC  CAG03      121899  121899                 0            0               0.0%          0.0%
CGATCGGTTC  CAG05      105021  105021                 0            0               0.0%          0.0%
TCAGGAATAC  CAG10      111584  111584                 0            0               0.0%          0.0%
TCCGACAAGC  MOL03      124968  124968                 0            0               0.0%          0.0%
CGGACAGATC  MOL04      142517  142517                 0            0               0.0%          0.0%
ACGAGTGCGT  MOL06      124895  124895                 0            0               0.0%          0.0%
END per_barcode_raw_read_counts

BEGIN barcodes_not_recorded
# A list and count of barcodes found that were not specified in the barcodes input file.
Barcode Total
AGCACGAATC 151607
CGAGGTTATC 143751
CTAAGGTAAC 119479
AGCACTGTAG 119178
AGACGCACTC 118086
CTTCCATAAC 115836
TTGGCTGGAC 112945
TAAGGAGAAC 111800
TTGGCATCTC 106509
CGAGAGATAC 101607
TGATACGTCT 100944
CAGAAGGAAC 95595
CATAGTAGTG 93636  ......

It is my first time working with Ion Torrent sequencing data, and I am not sure if this is due to some library construction feature. I would greatly appreciate it if someone could help me.

Catchen, Julian

Aug 5, 2023, 12:38:23 PM
to stacks...@googlegroups.com

Hi Olga,

 

The process_radtags program is searching the raw reads for the barcodes and the restriction enzyme cut site remnant that you specified should be present. Are these single-end data? If so, you do not need to specify the second restriction enzyme used.

 

Anyway, the log is telling you that it did not find many of the barcodes you specified should be present, but instead it found a bunch of other barcodes:

 

# A list and count of barcodes found that were not specified in the barcodes input file.
Barcode Total

AGCACGAATC 151607
CGAGGTTATC 143751
CTAAGGTAAC 119479
AGCACTGTAG 119178
AGACGCACTC 118086
CTTCCATAAC 115836
TTGGCTGGAC 112945
TAAGGAGAAC 111800

 

In addition, when it did find the barcode, it did not find the cut site remnant following that in the sequence.

 

So, you will need to determine if your barcode list is correct and compare it with what is present in the raw sequences. Process_radtags expects to see the barcode followed by the restriction enzyme cut site. Whoever designed your adaptors should be able to tell you the exact sequences that should be there; sometimes an extra ‘linker’ nucleotide is present, or some other thing that you will need to account for in your barcode list.
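
The comparison described above can be sketched in a few lines of Python. This is only an illustration, not part of Stacks: the function names are hypothetical, the barcode length of 10 comes from the log, and the choice to look at the 7 bases after the barcode is an arbitrary assumption.

```python
from collections import Counter
from itertools import islice


def fastq_sequences(lines, limit=None):
    """Yield the sequence line of each 4-line FASTQ record (optionally capped)."""
    seqs = islice(lines, 1, None, 4)  # 2nd line of every 4-line record
    for line in islice(seqs, limit):
        yield line.strip()


def tally_prefixes(sequences, barcode_len=10, tail_len=7):
    """Count the leading barcode-length prefix of each read and the bases after it."""
    prefixes, tails = Counter(), Counter()
    for seq in sequences:
        if len(seq) < barcode_len + tail_len:
            continue  # too short to hold a barcode plus the following bases
        prefixes[seq[:barcode_len]] += 1
        tails[seq[barcode_len:barcode_len + tail_len]] += 1
    return prefixes, tails


def compare_with_list(prefixes, expected_barcodes):
    """Split observed prefixes into those on the barcode list and those not on it."""
    expected = set(expected_barcodes)
    listed = {b: n for b, n in prefixes.items() if b in expected}
    unlisted = {b: n for b, n in prefixes.items() if b not in expected}
    return listed, unlisted
```

With a real file, `fastq_sequences(open("./rawdata/rawdata.fq"), limit=100000)` would feed the first 100,000 reads into `tally_prefixes`; the most common entries in `tails` show what actually follows the barcode, which can then be checked against the expected cut site remnant.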

 

Best,

 

julian

Olga Herrera

Aug 5, 2023, 2:59:02 PM
to Stacks

Hello Julian, thank you for responding. I checked that all the barcode sequences are present in my raw file. I managed to generate the files correctly using the parameter --disable_rad_check; however, I still have a large list of barcode sequences that the program detected but that are not in my list. My reads are single-end, and the sequences start with the barcode, followed by GAT and the enzyme's cut site, which in my case is TAAT. I tried placing the GATTAAT sequence in the --renz-1 parameter to see if the program would recognize it as the cut site, but the result is the same.
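
One possible way to account for a linker in the barcode list, sketched here under assumptions: the barcode file is taken to be whitespace-separated with the barcode in the first column, and the linker is taken to be GAT in every adapter, as described above. `add_linker` is a hypothetical helper, not a Stacks function, and whether the extended list solves the problem depends on the actual adapter design.

```python
def add_linker(barcode_lines, linker="GAT"):
    """Append a linker to the barcode in the first column of each line.

    Assumes one barcode per line, optionally followed by a sample name.
    Blank lines are skipped. Returns tab-separated lines.
    """
    extended = []
    for line in barcode_lines:
        fields = line.split()
        if not fields:
            continue
        fields[0] = fields[0] + linker  # barcode + linker, e.g. 10 bp -> 13 bp
        extended.append("\t".join(fields))
    return extended


# Hypothetical usage with two entries taken from the log above.
for row in add_linker(["AACCATCCGC\tU55536", "ATCCGGAATC\tU27140"]):
    print(row)
```

With the linker folded into the barcode, the bases that follow each extended barcode would be the cut site remnant itself rather than the linker.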

Finally, upon closer inspection, I noticed that these barcode sequences seem to have a shift in the first base. For example, my correct barcode is TTAGTCGGAC, and it correctly retrieves 312449 reads. However, in the list of barcode sequences found by the program, there is the sequence TTAGTCGGACG, which retrieves 950 reads. Is it possible that during the adapter trimming there was an error in some sequences that cut off the first base of the barcode?
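
The shift hypothesis can be tested directly on the list of unrecorded barcodes. The sketch below is an illustration only, with hypothetical function names, and assumes observed and known barcodes have the same length. For instance, if the first T of TTAGTCGGAC were lost and the read continued with the G of the GAT linker, the 10-base window would read TAGTCGGACG; that string is a constructed example, not a value from the log.

```python
def shift_relation(observed, known):
    """Describe how an observed 'barcode' relates to a known one.

    Returns "exact", "first base missing", "extra leading base", or None.
    Assumes both strings have the same length.
    """
    if observed == known:
        return "exact"
    overlap = len(known) - 1
    # Window starts one base late: the barcode lost its first base.
    if known[1:] == observed[:overlap]:
        return "first base missing"
    # Window starts one base early: an extra base precedes the barcode.
    if observed[1:] == known[:overlap]:
        return "extra leading base"
    return None


def explain_unlisted(unlisted, known_barcodes):
    """Map each unlisted barcode to the first known barcode it is a shift of."""
    explained = {}
    for observed in unlisted:
        for known in known_barcodes:
            relation = shift_relation(observed, known)
            if relation and relation != "exact":
                explained[observed] = (known, relation)
                break
    return explained
```

Unrecorded barcodes that `explain_unlisted` cannot match to any listed barcode are more likely to be samples missing from the barcode file than trimming or sequencing artifacts.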

Eloise Cave

Aug 5, 2023, 6:56:37 PM
to stacks...@googlegroups.com
Hi Olga, 

I had a similar situation where the RAD cut site couldn’t be found in my read 2. My data were paired-end, and I found that the 2 buffer bases before my RAD cut site were, for some reason, causing the program not to recognize it. Even the disable RAD check flag didn’t work for me. It was also causing the read quality to fail in my MultiQC report. What I did was trim those buffer bases off my read 2 in the raw multiplexed file using Trimmomatic. I then ran the trimmed file through process_radtags and it worked fine. I don’t know if some variation of this will work for you, but it is worth looking into. I tested it on a subsample of the data first so that it ran quicker.
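
The trim-and-subsample idea can be sketched in plain Python as a stand-in for the Trimmomatic step, which is not reproduced here. The function names are hypothetical, and the sketch assumes standard 4-line FASTQ records. Note that it removes bases from the very start of each read, which suits a read 2 that begins with buffer bases; in single-end reads that begin with the barcode, the same crop would remove part of the barcode instead.

```python
from itertools import islice


def fastq_records(lines):
    """Group an iterable of lines into (header, sequence, plus, quality) records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        yield [field.rstrip("\n") for field in record]


def head_crop(records, n_bases, max_records=None):
    """Drop the first n_bases from sequence and quality; keep at most max_records."""
    for header, seq, plus, qual in islice(records, max_records):
        yield header, seq[n_bases:], plus, qual[n_bases:]


# Hypothetical usage: crop 2 buffer bases from the first record only.
example = ["@r1", "GGTAATCC", "+", "IIIIIIII", "@r2", "GGTAATAA", "+", "IIIIIIII"]
for record in head_crop(fastq_records(example), n_bases=2, max_records=1):
    print("\n".join(record))
```

Capping `max_records` gives a quick subsample for a trial run before processing the whole file.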

As for the barcodes, I think you can allow a certain number of mismatches in the barcodes when running process_radtags, but I’m less familiar with this issue. 
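
For context, the log earlier in the thread already reports "Will attempt to recover barcodes with at most 1 mismatches." The sketch below illustrates the general idea of mismatch-tolerant barcode matching with a Hamming distance; it is not the Stacks implementation, and the function names are hypothetical. Substitution tolerance of this kind does not correct a one-base shift, because a shifted barcode usually differs from the original at many positions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rescue(observed, known_barcodes, max_mismatches=1):
    """Return the single known barcode within max_mismatches of observed.

    Returns None when no barcode, or more than one barcode, is that close.
    """
    hits = [
        known
        for known in known_barcodes
        if len(known) == len(observed) and hamming(observed, known) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None
```

Requiring a unique hit avoids assigning a read to the wrong sample when two listed barcodes are equally close to the observed sequence.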
Hope this helps. 
Eloise 

--
Eloise Cave, M.S.
PhD Candidate
Shark Conservation Research Lab
Florida Institute of Technology

Olga Herrera

Aug 6, 2023, 8:13:15 AM
to Stacks
Thank you, Eloise. I'm going to try that to see if it improves the results.
