Hi Olga,
The process_radtags program searches the raw reads for the barcodes and the restriction enzyme cut site remnant that you specified should be present. Are these single-end data? If so, you do not need to specify the second restriction enzyme used.
Anyway, the log is telling you that it did not find many of the barcodes you specified should be present, but instead it found a bunch of other barcodes:
# A list and count of barcodes found that were not specified in the barcodes input file.
Barcode Total
AGCACGAATC 151607
CGAGGTTATC 143751
CTAAGGTAAC 119479
AGCACTGTAG 119178
AGACGCACTC 118086
CTTCCATAAC 115836
TTGGCTGGAC 112945
TAAGGAGAAC 111800
…
In addition, when it did find the barcode, it did not find the cut site remnant following that in the sequence.
So, you will need to determine whether your barcode list is correct by comparing it with what is actually present in the raw sequences. process_radtags expects to see the barcode followed immediately by the restriction enzyme cut site remnant. Whoever designed your adaptors should be able to tell you the exact sequences that should be there. Sometimes an extra 'linker' nucleotide is present, or there is some other feature that you will need to account for in your barcode list.
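One way to make that comparison concrete is to tally the first bases of every raw read and look at what follows the barcodes you expect. The following is only a minimal Python sketch, not part of Stacks: the function names and the `remnant_window` parameter are hypothetical, and it assumes uncompressed four-line FASTQ records with inline barcodes that all have the same length.

```python
from collections import Counter


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd of every 4 lines) from FASTQ text."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip().upper()


def tally_barcodes(fastq_lines, barcodes, remnant_window=10):
    """Count reads per leading barcode-length prefix, and count the bases
    that directly follow each *expected* barcode.

    Assumption: every barcode in `barcodes` has the same length.
    """
    barcodes = set(barcodes)
    barcode_len = len(next(iter(barcodes)))
    found = Counter()      # every prefix seen, expected or not
    following = Counter()  # what comes right after an expected barcode
    for seq in read_sequences(fastq_lines):
        prefix = seq[:barcode_len]
        found[prefix] += 1
        if prefix in barcodes:
            after = seq[barcode_len:barcode_len + remnant_window]
            following[after] += 1
    return found, following
```

Pass it an open file handle (for example `open("reads.fastq")`, a placeholder path) together with your barcode list. `found.most_common(20)` shows which prefixes dominate, and `following.most_common(5)` shows whether the cut site remnant really sits directly after the barcode or whether extra linker bases come first.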
Best,
julian
Hello Julian, thank you for responding. I checked that all the barcode sequences are present in my raw file. I managed to generate the files correctly using the parameter --disable_rad_check; however, I still have a large list of barcode sequences that the program detected but are not in my list. My reads are single end, and the sequences start with the barcode, followed by GAT and the enzyme's cut site, which in my case is TAAT. I tried placing the GATTAAT sequence in the -renz-1 parameter to see if the program recognizes it as the cut site, but the result is the same.
Finally, on closer inspection, I noticed that these barcode sequences seem to have a shift in the first base. For example, my correct barcode is TTAGTCGGAC, and it correctly retrieves 312449 reads. However, the list of barcode sequences found by the program includes TTAGTCGGACG, which retrieves 950 reads. Is it possible that an error during adapter trimming cut off the first base of the barcode in some sequences?
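To check whether the unexpected sequences are really shifted copies of known barcodes, each one can be compared against the expected barcode. This is a rough Python sketch under my own assumptions: `classify` is a hypothetical helper, not a Stacks function, and it only looks at simple one-base shifts and single mismatches.

```python
def classify(observed, expected):
    """Describe how an observed barcode relates to one expected barcode."""
    if observed == expected:
        return "exact"
    if observed.startswith(expected):
        return "expected barcode plus extra trailing bases"
    if observed.startswith(expected[1:]):
        # The read begins at the second base of the expected barcode.
        return "first base missing"
    if observed[1:].startswith(expected[:len(observed) - 1]):
        # One additional base precedes the expected barcode.
        return "extra leading base"
    if len(observed) == len(expected):
        mismatches = sum(a != b for a, b in zip(observed, expected))
        if mismatches == 1:
            return "one mismatch"
    return "unrelated"


# Sequences quoted in this thread.
print(classify("TTAGTCGGACG", "TTAGTCGGAC"))
```

Running this over the whole list of unexpected barcodes against every expected barcode would show whether most of them fall into one category, which would point to a systematic cause such as trimming rather than random sequencing error.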
--
Stacks website: http://catchenlab.life.illinois.edu/stacks/
---
You received this message because you are subscribed to the Google Groups "Stacks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/stacks-users/86566959-c62d-4a79-b297-4a2289d76febn%40googlegroups.com.