Hi Luana,
For trimming, I would recommend either trimming uniformly or not trimming at all (and instead discarding reads with adaptor sequence). Trimming uniformly helps if you have adaptor but it is mostly concentrated at the 3’ ends of the reads; if you have adaptor sequence throughout, you probably need to discard those reads. If you were to trim variably per read, you would create a region of low coverage at the trim point for every RAD locus that contains adaptor sequence, producing low-quality genotype calls in those regions.
I can’t tell from your message how many of your reads actually have adaptor sequence in them. It is fine to use cutadapt, but process_radtags can also discard reads that contain adaptor (rather than trimming them), and it may give you a better idea of how many reads you are losing per sample.
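If you stay with cutadapt, something along these lines would cover the two options I described above (the adaptor sequence and file names below are only placeholders; substitute the actual adaptor used in your DArT library and your own file names):

# Option 1: discard any read in which adaptor sequence is found, no trimming (placeholder adaptor)
cutadapt -a AGATCGGAAGAGC --discard-trimmed -o sample.noadapter.fq.gz sample.fq.gz

# Option 2: truncate every read uniformly to a fixed length instead
cutadapt -l 60 -o sample.trimmed.fq.gz sample.fq.gz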
I don’t understand what you are saying here:
> I removed some 1 to 4 pb that fastqc showed were bad. And by logic after this, my sequence was of variable length (1 to 66 media of 52).
This is also hard to interpret:
> My idea was to do the integrated method. For the denovo I needed to use uniformed reads length, so I used process_radtags with t- 60, I lost 42.8% (because of the trimming of the adapters I guess).
Did you specify the adaptors to process_radtags, or just trim the data? If you are filtering on adaptor, I would not also trim, since reads with adaptor will be discarded anyway. It is hard to reconcile your first process_radtags run, with 33,663 reads lost to quality, with your second run, with 79,680,041 reads lost to quality. If you run process_radtags with the truncation length set to 60 and a bunch of the reads are already shorter than 60, they will be discarded, which may be why you have 79M reads dropped. Again, I would run process_radtags on the raw data with the adaptors provided and see what you get.
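For example, starting from your earlier command, something like this (the adaptor sequence is again just a placeholder for your actual adaptor, and recent Stacks releases spell these options with dashes, i.e. --adapter-1 / --adapter-mm):

# placeholder adaptor sequence; replace with the adaptor from your library prep
process_radtags -p ./alti_RAW/ -o ./process_radtags/ -b barcodes_alti -e pstI -r -c -q \
    --adapter_1 AGATCGGAAGAGC --adapter_mm 2 --disable_rad_check

That will discard reads containing adaptor without truncating anything, and the log should show how many reads were dropped per sample for containing adaptor (--adapter_mm controls how many mismatches are tolerated when matching the adaptor).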
julian
From: stacks...@googlegroups.com <stacks...@googlegroups.com> on behalf of Luana Sousa <lusousa....@gmail.com>
Date: Tuesday, September 27, 2022 at 1:41 PM
To: Stacks <stacks...@googlegroups.com>
Subject: [stacks] trimming or not trimming?
Hey Julian and everyone!
I'm working with raw single-end Illumina HiSeq 2500 sequences from a double-digest (PstI-MseI) library from DArT.
I'm unsure whether I should trim my data. The data was sent to me already demultiplexed, so I just used process_radtags to remove the barcodes and low-quality reads:
process_radtags -p ./alti_RAW/ -o ./process_radtags/ -b barcodes_alti -e pstI -r -c -q --disable_rad_check
Total Sequences 186032269
Ambiguous Barcodes 0
Low Quality 33663
Ambiguous RAD-Tag 0
Retained Reads 185998606
This way, my reads went from 84 to 69 bp. (I used --disable_rad_check because I lost almost all reads with the check enabled; I guess not all the reads had the cut site. I tried specifying the two enzymes but that did not work either.)
…
I used cutadapt to remove the adaptor:
Thank you very much, Julian.
I used cutadapt to trim the adaptors and left the reads with variable lengths. I set the process_radtags truncation length to 60 because that was the length of most of the reads. The dropped reads were in fact all those that had been trimmed by cutadapt.
Following your advice, I ran process_radtags on the raw data:
If it is preferable to remove the reads with adaptor, I will proceed with these data. The denovo_map optimization tests are calling about 30k r80 SNPs.
Thank you again!